SUMMARY


This  study  was  undertaken  to  assess  the  test-retest  reliability  of  the 
expanded  Medilog  SS-90-III  Sleep  Stager  by  comparing  sleep  stager  scoring  of 
the  same  records  scored  more  than  once.  Subjects  consisted  of  19  normal 
sleepers  who  slept  one  night  at  home,  following  their  normal  routine.  They 
ranged in age from 19.3 to 63.5 years, with mean = 31.1 and median = 24.7. There
were 15 males and 4 females.

All  sleep  tapes  were  processed  five  consecutive  times  on  the  day  of 
sleep  scoring,  termed  runs  1-5.  Dependent  variables  chosen  for  study 
consisted  of:  total  sleep  period,  actual  sleep  time,  wake  after  sleep  onset 
(greater  and  less  than  120  sec.),  total  movement  time,  total  stage  1,  total 
stage  2,  total  stage  3,  total  stage  4,  total  REM,  percent  of  stage  1,  percent 
of  stage  2,  percent  of  stage  3,  percent  of  stage  4,  percent  of  stage  REM, 
start  to  sleep  onset,  sleep  onset  to  REM,  sleep  onset  to  stage  2,  sleep  onset 
to  stage  3,  and  sleep  onset  to  stage  4. 

Pearson  correlations  and  alpha  coefficients  were  calculated  to  evaluate 
the  reliability  of  each  measure  over  the  five  runs  combined.  Correlations 
between  runs  ranged  between  .878  and  1.00  for  all  measures,  except  sleep 
onset  to  REM  through  sleep  onset  to  stage  4,  where  correlations  ranged  from 
-.003  to  .991.  Alpha  coefficients  ranged  from  .98  to  1.00  for  all  measures, 
including  total  sleep  time,  movement  time,  sleep  onset,  waking  after  sleep 
onset,  and  both  absolute  and  percentage  amounts  of  sleep  stages  1-4  and  REM, 
but not including the latency measures. Reliability of the latency
measures was lower: acceptable for REM latency, marginal for stage 2 and 3
latencies, and not acceptable for stage 4 latency.


INTRODUCTION 


Important  technical  advances  in  portable  sleep  recording  and  automated 
EEG  sleep  stage  scoring  methods  have  been  introduced  in  recent  years.  These 
improved  methods  offer  several  economies  for  both  subjects  and  investigators 
by  virtue  of  the  fact  that  sleep  data  are  recorded  on  portable  cassette  tape 
EEG  recorders.  The  recording  electrodes  can  be  applied  almost  any  time  day 
or night at the technologist's convenience, and the portability of the equipment
means that subjects can choose to sleep at home, or in other, more
familiar,  environments.  Further,  the  cassette  tape  can  be  scored  in  as 
little  as  30-60  minutes  and  stored  indefinitely  as  needed  at  minimal  cost  and 
space.  Thus,  this  improved  procedure  has  important  advantages  for  (1) 
improved  subject  acceptance  leading  potentially  to  better  sleep,  (2)  a  more 
flexible  workload  for  the  technologist,  and  (3)  quicker,  yet  reliable  scoring 
of  sleep  records. 

Evaluative  reports  in  the  literature  have  generally  confirmed  these 
advantages.  Sewitch  and  Kupfer  (1,2)  have  reported  a  project  in  which  they 
compared  the  Oxford  Medilog  9000  and  Telediagnostic  systems  with  laboratory 
recordings.  They  concluded  that  there  were  no  differences  in  standard  EEG 
sleep  parameters  recorded  in  either  the  home  or  laboratory.  Hoelscher  et  al. 
(3)  studied  the  usefulness  of  the  Oxford  Medilog  system  in  evaluating  a 
variety of sleep/wake disorders in a large clinical population and concluded
that  "the  Oxford-Medilog  9000  can  be  used  to  evaluate  a  variety  of 
sleep-related  disorders,  results  in  acceptable  recordings  in  90-97%  of  all 
studies,  and  is  well-accepted  by  most  patients"  (p.607).  However,  both 
Sewitch  and  Kupfer  and  Hoelscher  et  al.  noted  difficulties  in  some  cases,  as 
well  as  remaining  questions  in  scoring  details. 

More  recently,  Crawford  (4)  and  Holler  and  Riemer  (5)  both  compared 
visually-scored  versus  Medilog  Sleep  Stager-scored  sleep  records,  in  an 
effort  to  assess  the  accuracy  and  reliability  of  the  automated  method. 
Crawford  found  a  range  of  74  to  89%  agreement  (mean  =  84%  epoch  by  epoch) 
between  manual  and  automated  sleep  scoring  of  20  recordings.  However,  Holler 
and  Riemer  found  consistent  differences  in  (1)  sleep  onset  time  and  (2)  REM 
time,  but  not  any  other  sleep  measures,  between  manual  and  automated  methods 
when  applied  to  their  sample  of  four  sleep  records.  Neither  Crawford  nor 
Holler and Riemer compared automated sleep stager scoring of the same records
scored more than once, i.e., correlation of the automated method with itself.
Further, while the reports of these investigators do indicate an
acceptable  level  of  reliability  of  automated  sleep  scoring  by  the  Medilog 
Sleep  Stager,  it  should  be  noted  that  the  software  supporting  this  system  has 
been  recently  revised  and  expanded,  thus  requiring  a  revised  and  expanded 
evaluation  of  the  scoring.  The  present  study  was  therefore  undertaken  to 
evaluate  the  reliability  of  the  expanded  Medilog  SS-90-III  Sleep  Stager  by 
comparing  sleep  stager  scoring  of  the  same  records  scored  more  than  once. 

METHODS 

Subjects 

Subjects  consisted  of  21  healthy,  normal  sleepers  recruited  in  the  San 
Diego  area.  Four  subject-night  recordings  were  rejected  because  of  electrode 
failure  or  inability  of  the  subject  to  sleep  more  than  four  hours.  Two  of 
these  were  successfully  rescheduled  for  a  second  night,  and  two  were  dropped 
from  the  study. 

Therefore,  the  final  N  consisted  of  19  subjects,  ranging  in  age  from  19.3 
to  63.5  years,  with  mean  =  31.1  and  median  =  24.7  years.  There  were  15  males 
and 4 females. Mean age of the females was not significantly different from
that of the males (means of 32.7 and 30.9 years, respectively).

Procedures 

Electrode  attachment  for  most  subjects  was  scheduled  between  1400-1600  at 
the  sleep  laboratory,  although  in  a  minority  of  cases  this  was  done  at  the 
subject's  work  location.  Subjects  then  slept  at  home,  following  their  normal 
routine,  and  returned  to  the  laboratory  about  0600-0700  for  removal  of  the 
electrodes.  Several  subjects  were  given  a  bottle  of  acetone  to  remove  the 
electrodes  by  themselves  at  home  in  the  morning. 

The  standard  sleep-monitoring  montage  consisted  of  the  following:  one 
channel of EEG (C4 to opposite mastoid), two channels of EOG (outer canthus
to  opposite  mastoid),  and  one  channel  of  EMG  from  the  submental  muscle  of  the 
chin.  Clock  time  was  used  to  define  the  beginning  and  end  of  the  total  sleep 
period  (TSP). 

Sleep  scoring  of  all  tapes  was  conducted  1-2  days  after  the  recording 
night.  All  recommended  procedures  in  the  1987  operator's  manuals  (Medilog 
9000  Replay  and  Display  System  Operator's  Instruction  and  Service  Manual; 
Medilog  SS-90-III  Sleep  Stager  Operator's  Manual)  supplied  by  Oxford  Medical 
Ltd.  were  followed  closely.  All  tapes  were  processed  on  the  Sleep  Stager 
five  consecutive  times  on  the  day  of  sleep  scoring  (i.e.,  five  independent 
scoring runs always conducted on the same day), hereafter referred to as runs
1-5. 

The Sleep Stager provides a printout with a hypnogram (see sample in
Crawford (4)), plus the following measures: total sleep period (TSP), actual
sleep time (AST), wake after sleep onset (WASO) > 120 sec., wake after sleep
onset (WASO) < 120 sec., total movement time (TMT), total stage 1 (TOT1),
total stage 2 (TOT2), total stage 3 (TOT3), total stage 4 (TOT4), total REM
(TOTR), start to sleep onset (SSO), sleep onset to first REM (SOR), sleep
onset to stage 2 (SO2), sleep onset to stage 3 (SO3), and sleep onset to
stage 4 (SO4).

Of  this  list,  all  measures  from  AST  to  TOTR  were  provided  in  more  than 
one  unit  of  measurement  or  format,  viz.,  by  (1)  number  of  epochs  (30  secs.), 
(2)  number  of  episodes  (any  occurrence  lasting  one  epoch  or  more),  (3)  number 
of  minutes,  (4)  percent  of  total  sleep  period,  and  (5)  percent  of  actual 
sleep  time  (where  appropriate).  Thus,  because  of  duplication,  it  was 
necessary  to  choose  a  smaller  yet  representative  group  of  dependent  variables 
for  the  present  study.  This  group  is  listed  in  Table  1.  All  time  measures 
are  given  in  minutes.  Latencies  are  calculated  to  the  first  four  contiguous 
epochs  (2  min.)  of  the  stage.  There  is  a  partial  duplication  of  measures 
only  in  that  sleep  stages  1  through  REM  are  listed  by  both  total  time 
(TOT1-TOTR) and percent of actual time (PER1-PERR).
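
The relation between the reporting units described above (30-second epochs, minutes, and percentages) can be illustrated with a minimal sketch. The function names and the example night are hypothetical and are not taken from the Sleep Stager software; only the 30-second epoch length comes from the text.

```python
EPOCH_SECONDS = 30  # epoch length stated in the text


def epochs_to_minutes(n_epochs):
    """Convert a count of 30-second epochs to minutes."""
    return n_epochs * EPOCH_SECONDS / 60.0


def percent_of(part_minutes, whole_minutes):
    """Express one duration as a percentage of another (0.0 if the whole is zero)."""
    if whole_minutes == 0:
        return 0.0
    return 100.0 * part_minutes / whole_minutes


# Hypothetical night (NOT study data): epoch counts for TSP, AST, and stage 4.
tsp = epochs_to_minutes(960)   # 480.0 minutes of total sleep period
ast = epochs_to_minutes(800)   # 400.0 minutes of actual sleep time
tot4 = epochs_to_minutes(120)  # 60.0 minutes of stage 4

print(percent_of(tot4, tsp))  # 12.5 (percent of total sleep period)
print(percent_of(tot4, ast))  # 15.0 (percent of actual sleep time)
```

Under this conversion, the four contiguous epochs used in the latency criterion correspond to 2 minutes, as stated above.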

RESULTS 

Means,  standard  deviations,  and  minimum  and  maximum  values  for  all 
measures  and  all  runs  are  summarized  in  Table  1.  The  minimum  values  listed 
are  the  lowest  of  the  five  minima  of  five  runs,  and  the  maximum  values  are 
the  highest  of  the  five  maxima  of  five  runs. 

Pearson  correlations  between  runs 

Pearson  correlation  coefficients  were  calculated  between  consecutive 
sleep  stage  runs,  which  produced  five  correlation  coefficients  for  each 
dependent  variable.  Runs  were  paired  as  follows:  1-2,  2-3,  3-4,  4-5,  and 
5-1. The range of these correlations between runs for each dependent variable
is given in Table 1. It can be seen that the correlations between runs
for the first 16 variables (TSP to SSO) were always between .878 and 1.00.
Twenty-nine out of 32 correlations were .90 or above, and the remaining three
were above .87. For the four latency measures, SOR to SO4, the range of
Pearson correlations was from -.003 to .991, and three of the minimum
correlations listed between runs for these measures (those for SO2, SO3, and
SO4) failed to reach statistical significance at p<.05.

Table 1: Summary of mean, standard deviation, minimum and maximum value,
range of Pearson correlations between runs, and alpha reliability
coefficient for each of the dependent measures, all runs (1-5) combined.

MEASURE     MEAN  STD DEV    MIN    MAX  RANGE OF CORRS  ALPHA COEFF
                                         BETW RUNS

TSP       476.11    81.28  330.0  653.5  .998-1.00       1.000
AST       400.44    71.32  265.0  542.0  .972-.99?       .998
WASOG      89.83    71.89    0.0  265.0  .945-.994       .994
WASOL      58.75    29.85    5.0  132.0  .899-.98?       .989
TMT         2.75     4.93    0.0   18.0  .962-.99?       .996
TOT1      119.90   119.74    2.0  422.0  .977-.99?       .997
TOT2      261.46   132.40    0.0  460.0  .878-.99?       .989
TOT3       41.41    35.12    0.0  131.0  .993-.99?       .999
TOT4       49.14    60.99    0.0  215.0  .997-1.00       1.000
TOTR      328.91   155.24   59.0  677.0  .969-.99?       .997
PER1       15.49    15.48    0.2   57.8  .976-.99?       .997
PER2       32.30    15.17    0.0   54.8  .877-.99?       .988
PER3        5.01     4.05    0.0   13.4  .990-.99?       .999
PER4        5.79     6.90    0.0   19.9  .996-.99?       1.000
PERR       41.39    17.86    6.4   82.9  .970-.99?       .997
SSO        15.63    15.94    0.0   72.5  .944-.99?       .994
SOR        56.64    58.57    0.0  432.5  .779-.97?       .943
SO2        24.01    46.87    0.0  408.5  .097-.82?       .645
SO3        41.49    78.82    0.0  438.0  .243-.83?       .761
SO4        16.13    36.63    0.0  497.0  -.003-.991      .246

NOTE: N = 19; r of .43 = p<.05; r of .55 = p<.01. A "?" marks a final digit
that is illegible in the source copy. Abbreviations: TSP = total sleep
period, AST = actual sleep time, WASOG = WASO > 120 sec., WASOL = WASO < 120
sec., TMT = total movement time, TOT1 = total stage 1, TOT2 = total stage 2,
TOT3 = total stage 3, TOT4 = total stage 4, TOTR = total REM, PER1 = percent
stage 1, PER2 = percent stage 2, PER3 = percent stage 3, PER4 = percent
stage 4, PERR = percent stage REM, SSO = start to sleep onset, SOR = sleep
onset to REM, SO2 = sleep onset to stage 2, SO3 = sleep onset to stage 3,
SO4 = sleep onset to stage 4.
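
The between-run correlation procedure described in this section can be sketched as follows. This is an illustrative sketch only, not the software used in the study: the function names are hypothetical, and the example values are invented for demonstration. Only the run pairing (1-2, 2-3, 3-4, 4-5, 5-1) is taken from the text.

```python
import math

# Run pairings stated in the text: 1-2, 2-3, 3-4, 4-5, and 5-1.
RUN_PAIRS = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one run has no variance
    return sxy / math.sqrt(sxx * syy)


def correlation_range(runs):
    """Return (min, max) of r over the five run pairings.

    `runs` maps run number (1-5) to one value per subject,
    with subjects in the same order in every run.
    """
    rs = [pearson_r(runs[a], runs[b]) for a, b in RUN_PAIRS]
    return min(rs), max(rs)


# Hypothetical data (NOT from the study): one measure, four subjects, five runs.
example_runs = {
    1: [56.0, 80.5, 62.0, 91.0],
    2: [56.0, 80.5, 62.5, 91.0],
    3: [56.5, 80.5, 62.0, 91.0],
    4: [56.0, 81.0, 62.0, 91.0],
    5: [56.0, 80.5, 62.0, 90.5],
}
low, high = correlation_range(example_runs)
print(f"range of r between runs: {low:.3f} to {high:.3f}")
```

The pair (min, max) returned for each measure corresponds to one entry in the range-of-correlations column of Table 1.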

Alpha reliability coefficients

Alpha coefficients, using a method described by Cronbach (6), were calculated
in order to evaluate the reliability of each measure over the five runs
combined, based on the ratio of the summed variances of the individual runs to
the variance of the composite of the five runs. The resulting alpha
coefficients are given in Table 1. It can be seen that the alpha coefficients
for the first 16 measures, TSP to SSO, are consistently .98 to 1.00, indicating
very high reliability. For the latency measures (SOR to SO4), however, the
alpha coefficients range from .94 for SOR down to .24 for SO4, the latter
failing to reach statistical significance. While the alpha coefficients for
SOR, SO2, and SO3 were significant at p<.01, one might still question whether
the coefficients of .64 and .76 for SO2 and SO3, respectively, are acceptable
reliabilities for an automated scoring procedure.
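
As a rough illustration of the alpha calculation, the sketch below uses the standard form of Cronbach's alpha, treating the five runs as the "items": alpha = k/(k-1) x (1 - sum of run variances / variance of the per-subject totals). It is assumed, not confirmed by the text, that the study used exactly this form. The function name and data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(runs):
    """Cronbach's alpha for one measure scored on k repeated runs.

    `runs` is a list of k lists; each inner list holds one run's values,
    one per subject, with subjects in the same order in every run.
    """
    k = len(runs)
    if k < 2:
        raise ValueError("alpha needs at least two runs")
    # Sum of the sample variances of the individual runs.
    sum_run_var = sum(variance(run) for run in runs)
    # Composite score per subject: the sum over all runs.
    totals = [sum(values) for values in zip(*runs)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("composite has zero variance; alpha is undefined")
    return (k / (k - 1)) * (1.0 - sum_run_var / total_var)


# Hypothetical data (NOT from the study): four subjects, five runs.
example = [
    [56.0, 80.5, 62.0, 91.0],
    [56.0, 80.5, 62.5, 91.0],
    [56.5, 80.5, 62.0, 91.0],
    [56.0, 81.0, 62.0, 91.0],
    [56.0, 80.5, 62.0, 90.5],
]
print(round(cronbach_alpha(example), 3))
```

When every run returns identical values, the ratio of variances equals 1/k and alpha equals 1, which is the pattern seen for TSP and TOT4 in Table 1.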

DISCUSSION 

The  primary  findings  of  this  study  may  be  summarized  as  follows:  (1) 
Medilog  SS-90-III  Sleep  Stager  scoring  of  most  sleep  measures  was  highly 
reliable,  with  alpha  coefficients  ranging  from  .98  to  1.00  for  measures 
including  total  sleep  time,  movement  time,  sleep  onset,  waking  after  sleep 
onset,  and  both  absolute  and  percentage  amounts  of  sleep  stages  1-4  and  REM; 
(2) scoring of latency measures was less reliable, although certainly acceptable
for REM latency (SOR), and marginally acceptable for stage 2 and 3
latencies (SO2 and SO3), but not acceptable for an automated scoring of stage
4 latency (SO4).

While direct comparison with previous reports is difficult due to differences
in procedures and methods of analysis, it would appear that our
findings with the Medilog SS-90-III Sleep Stager represent a very considerable
improvement over the results reported by both Crawford (4) and Holler
and Riemer (5). Crawford found an average 84% agreement between manual and
machine  scoring  across  epochs  for  all  measures,  clearly  suggesting  a  lower 
reliability,  although  she  was  not  assessing  retest  reliability.  Similarly, 
Holler  and  Riemer  reported  consistent  error  in  measures  of  sleep  onset  and 
REM  time,  although  neither  of  these  variables  appeared  as  problems  in  our 
data. While we did not compare manual and automated scoring, the automated
method provides a benchmark of retest reliability to which the manual method
can only aspire.

Clear exceptions to this statement were the latency measures SO2, SO3,
and SO4, especially the latter, where alpha coefficients were either
marginally or unacceptably low. The reason for this difficulty is undoubtedly
that the stager defined latencies as the first four contiguous epochs of a
given stage. Thus, for example, in measuring stage 4 onset, the scoring
sequence 3444434443 would count, whereas 3444334443 would not, even though
the overall accuracy of scoring was consistently high. Whenever the sleep
stager missed the onset of a stage in this manner, and the subject did not
show that stage again at all or until much later, then the latency value was
greatly increased. Clearly, this effect would be larger in the case of a
sleep stage that occurred most frequently, as was true of stage 4 sleep. Of
the total of 285 reports of latency to onset of sleep stages 2, 3, and 4 (19
subjects x 5 runs x 3 measures), such shifts were observed in 10 cases, or
3.5%. Of these 10, three involved stage 2, three involved stage 3, and four
involved stage 4.
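
The four-contiguous-epoch criterion discussed above can be made concrete with a short sketch applied to the two example sequences. This is not the Sleep Stager's actual algorithm. The function name is hypothetical, and it is an assumption that latency is measured to the start of the qualifying run, with the sequence beginning at sleep onset.

```python
EPOCH_MINUTES = 0.5  # 30-second epochs, as stated in the text
MIN_RUN = 4          # four contiguous epochs (2 min.) required


def stage_latency(stages, stage, sleep_onset_index=0):
    """Minutes from sleep onset to the first run of MIN_RUN contiguous
    epochs of `stage`, or None if no such run occurs.

    `stages` is a sequence of per-epoch stage labels, e.g. "3444434443".
    Assumption: latency is counted to the START of the qualifying run.
    """
    run_start = None
    run_len = 0
    for i in range(sleep_onset_index, len(stages)):
        if stages[i] == stage:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == MIN_RUN:
                return (run_start - sleep_onset_index) * EPOCH_MINUTES
        else:
            run_len = 0  # run broken; start counting again
    return None


print(stage_latency("3444434443", "4"))  # 0.5  (run of four 4s starts at epoch 1)
print(stage_latency("3444334443", "4"))  # None (longest run of 4s is three epochs)
```

A single epoch scored differently between runs is enough to turn the first result into the second, which is how a small scoring difference can produce a large change in the reported latency.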